Auto-assemblage for Suffix Tree Clustering
Abstract
The volume of information that must be extracted from large data repositories is growing explosively, and clustering is used to keep search results effective. Clustering groups content of a similar type, which makes searching more efficient. Document clustering organizes documents with similar content into groups. Partitioned and hierarchical clustering algorithms are the main approaches used for clustering documents. In this paper, k-means is described as the partitioned clustering algorithm, and hierarchical clustering is described in its agglomerative and divisive forms. The paper presents a tool that illustrates the algorithmic steps of the Suffix Tree Clustering (STC) algorithm for clustering documents. STC is a search result clustering algorithm; here it is applied to a dataset consisting of a collection of text documents. The paper focuses on the steps for document clustering with STC, and each step is shown through screenshots taken from the running tool.
Keywords: Data Mining, Document Clustering, Hierarchical Clustering, Information Retrieval, Partitioned Clustering, Score Function, Similarity Measures, Suffix Tree Clustering, Suffix Tree Data Model, Term Frequency and Inverse Document Frequency.
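The abstract names the STC steps without listing them, so the following is only a minimal Python sketch of how base-cluster identification, scoring, and merging are commonly described for STC. It is not the tool presented in the paper. Several things are assumptions: the function names (`base_clusters`, `score`, `merge`) are hypothetical, a naive phrase index stands in for a real suffix tree, the score takes the form |B| * f(|P|) with an illustrative f, and two base clusters are merged when they share more than half of each other's documents.

```python
from collections import defaultdict
from itertools import combinations


def base_clusters(docs, max_len=4, min_docs=2):
    """Map each word phrase to the set of documents that contain it.

    Assumption: a naive index of phrases up to max_len words replaces the
    suffix tree that an actual STC implementation would build.
    """
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        words = text.lower().split()
        for i in range(len(words)):
            for n in range(1, max_len + 1):
                if i + n <= len(words):
                    index[tuple(words[i:i + n])].add(doc_id)
    # Keep only phrases shared by at least min_docs documents.
    return {p: d for p, d in index.items() if len(d) >= min_docs}


def score(phrase, doc_ids):
    """Assumed score s(B) = |B| * f(|P|); f is an illustrative choice."""
    n = len(phrase)
    f = 0.5 if n == 1 else min(n, 6)  # penalise single words, cap at 6
    return len(doc_ids) * f


def merge(clusters, threshold=0.5):
    """Merge base clusters whose document sets overlap in both directions."""
    phrases = list(clusters)
    parent = {p: p for p in phrases}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(phrases, 2):
        common = len(clusters[a] & clusters[b])
        if (common / len(clusters[a]) > threshold
                and common / len(clusters[b]) > threshold):
            parent[find(a)] = find(b)

    merged = defaultdict(set)
    for p in phrases:
        merged[find(p)] |= clusters[p]
    return sorted(sorted(d) for d in merged.values())


if __name__ == "__main__":
    sample = [
        "suffix tree clustering",
        "suffix tree index",
        "k means algorithm",
        "fast k means",
    ]
    print(merge(base_clusters(sample)))
```

With the four sample titles, the first two documents share the phrase "suffix tree" and the last two share "k means", so the sketch groups them into two clusters. A production implementation would build a generalized suffix tree to find shared phrases in linear time and would filter stop words before scoring.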
Similar resources
Suffix Tree Clustering - Data mining algorithm
Data mining is the process of finding new, useful knowledge in data using a variety of techniques. These techniques allow faster and better searching of the large amounts of data we face every day. Clustering is one of the techniques used in data mining. The authors explore clustering algorithms and select suffix tree clustering as the best of them. Authors create ...
Clustering of Web Search Results Using Semantic
Clustering is related to data mining for information retrieval. Relevant information is retrieved quickly while doing the clustering of documents. It organizes the documents into groups; each group contains the documents of similar type content. Different clustering algorithms are used for clustering the documents such as partitioned clustering (K-means Clustering) and Hierarchical Clustering (...
Semantic Suffix Tree Clustering
This paper proposes a new algorithm, called Semantic Suffix Tree Clustering (SSTC), to cluster web search results containing semantic similarities. The distinctive methodology of the SSTC algorithm is that it simultaneously constructs the semantic suffix tree through an on-depth and on-breadth pass by using semantic similarity and string matching. The semantic similarity is derived from the Wor...
A New Cluster Merging Algorithm of Suffix tree Clustering
Document clustering methods can be used to structure large sets of text or hypertext documents. Suffix Tree Clustering has proven to be a good approach for document clustering. However, its cluster merging algorithm is based on the overlap of the clusters' document sets, which totally ignores the similarity between the non-overlapping parts of different clusters. In this pape...
Suffix Tree Clustering on Post-retrieval Documents
Clustering is used to divide a collection of data into groups based on similarity of objects. With respect to IR, document clustering has been studied. An information retrieval (IR) system would always return a list of retrieved documents to the user. The post-retrieval documents can be clustered in order to help users browse and navigate the searching results. For this purpose, Zamir and Etzio...